UMass at TREC 2004: Novelty and HARD

Authors

  • Nasreen Abdul Jaleel
  • James Allan
  • W. Bruce Croft
  • Fernando Diaz
  • Leah S. Larkey
  • Xiaoyan Li
  • Mark D. Smucker
  • Courtney Wade
Abstract

For the TREC 2004 Novelty track, UMass participated in all four tasks. Although finding relevant sentences was harder this year than last, we continue to show marked improvements over the baseline of calling all sentences relevant, with a variant of tfidf being the most successful approach. We achieve 5–9% improvements over the baseline in locating novel sentences, primarily by looking at the similarity of a sentence to earlier sentences and by focusing on named entities. For the High Accuracy Retrieval from Documents (HARD) track, we investigated the use of clarification forms, fixed- and variable-length passage retrieval, and the use of metadata. Clarification form results indicate that passage-level feedback can provide improvements comparable to user-supplied related text for document evaluation and outperforms related text for passage evaluation. Document retrieval methods without a query expansion component show the most gains from related text. We also found that displaying the top passages for feedback outperformed displaying centroid passages. Named entity feedback resulted in mixed performance. Our primary findings for passage retrieval are that document retrieval methods performed better than passage retrieval methods on the passage evaluation metric of binary preference at 12,000 characters, and that clarification forms improved passage retrieval for every retrieval method explored. We found no benefit to using variable-length passages over fixed-length passages for this corpus. Our use of geography and genre metadata resulted in no significant changes in retrieval performance.
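The novelty component described above scores each sentence against the sentences that precede it. Below is a minimal sketch of that idea, not the UMass system itself: the use of scikit-learn's TfidfVectorizer, cosine similarity, and the 0.5 threshold are all illustrative assumptions.

```python
# Minimal sketch of threshold-based sentence novelty detection.
# NOT the UMass TREC 2004 system; vectorizer choice, similarity
# measure, and threshold value are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def novel_sentences(sentences, threshold=0.5):
    """Return indices of sentences whose maximum tf-idf cosine
    similarity to any *earlier* sentence falls below the threshold."""
    tfidf = TfidfVectorizer().fit_transform(sentences)  # one row per sentence
    novel = [0]  # the first sentence is novel by definition
    for i in range(1, len(sentences)):
        # Compare sentence i only against the sentences seen before it.
        sims = cosine_similarity(tfidf[i], tfidf[:i])
        if sims.max() < threshold:
            novel.append(i)
    return novel

if __name__ == "__main__":
    sents = [
        "The storm hit the coast on Monday.",
        "A hurricane struck the coastline Monday morning.",
        "Officials announced an evacuation order.",
    ]
    print(novel_sentences(sents))  # e.g. [0, 2] under this threshold
```

A sentence that closely paraphrases an earlier one (as in the second example sentence) is suppressed, while sentences introducing new material pass through; tightening the threshold trades novelty precision for recall.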


Similar articles

UMass at TREC 2002: Cross Language and Novelty Tracks

The University of Massachusetts participated in the cross-language and novelty tracks this year. The cross-language submission was characterized by a combination of evidence to merge results from two different retrieval engines and by a variety of resources: stemmers, dictionaries, machine translation, and an acronym database. We found that proper names were extremely important in this ye...


NLPR at TREC 2005: HARD Experiments

1 Overview It is the third time that the Chinese Information Processing Group of NLPR has taken part in TREC. In the past, we participated in the Novelty track and the Robust track, in which we evaluated our two key notions: the Window-based Retrieval Algorithm and the Result Emerging Strategy [1][2]. This year we focus on investigating the significance of relevance feedback, so the HARD track is our best choice. HARD...


Improved Feature Selection and Redundance Computing - THUIR at TREC 2004 Novelty Track

This is the third year that the Tsinghua University Information Retrieval Group (THUIR) has participated in the Novelty task of TREC. Our research on this year's novelty track focused mainly on four aspects: (1) text feature selection and reduction; (2) improved sentence classification for finding relevant information; (3) efficient sentence redundancy computing; (4) effective result filtering. All experime...


Graph-Based Text Representation For Novelty Detection

We discuss several feature sets for novelty detection at the sentence level, using the data and procedure established in task 2 of the TREC 2004 novelty track. In particular, we investigate feature sets derived from graph representations of sentences and sets of sentences. We show that a highly connected graph produced by using sentence-level term distances and pointwise mutual information can ...


ISI Novelty Track System for TREC 2004

We describe our system developed at ISI for the Novelty track at TREC 2004. The system's two modules recognize relevant event and opinion sentences, respectively. We focused mainly on recognizing relevant opinion sentences using various opinion-bearing word lists. Of our 5 runs submitted for task 1, the best run achieved an F-score of 0.390 (precision 0.30, recall 0.71).



Publication date: 2004